Pursuing the Goal of Language Understanding
Abstract
No human being can understand every text or dialog in his or her native language, and no one should expect a computer to do so. However, people have a remarkable ability to learn and to extend their understanding without explicit training. Fundamental to human understanding is the ability to learn and use language in social interactions that Wittgenstein called language games. Those language games use and extend prelinguistic knowledge learned through perception, action, and social interactions. This article surveys the technology developed for natural language processing and the successes and failures of various attempts. Although many useful applications have been implemented, the original goal of language understanding seems as remote as ever. Fundamental to understanding is the ability to recognize an utterance as a move in a social game and to respond in terms of a mental model of the game, the players, and the environment. Those models use and extend the prelinguistic models learned through perception, action, and social interactions. Secondary uses of language, such as reading a book, are derivative processes that elaborate and extend the mental models originally acquired by interacting with people and the environment. A computer system that relates language to virtual models might mimic some aspects of understanding, but full understanding requires the ability to learn and use new knowledge in social and sensory-motor interactions. These issues are illustrated with an analysis of some NLP systems and a recommended strategy for the future. None of the systems available today can understand language at the level of a child, but with a shift in strategy there is hope of designing more robust and usable systems in the future. This is a slightly revised and extended version of a paper in the Proceedings of the 16th ICCS, edited by P. Eklund and O. Haemmerlé, LNAI 5113, Springer, Berlin, 2008, pp. 21-42.
1. The Goal of Language Understanding

Some early successes of artificial intelligence led to exaggerated expectations. One example was the theorem prover by Hao Wang (1960), which proved the first 378 theorems of the Principia Mathematica in 7 minutes — an average of 1.1 seconds per theorem on the IBM 704, a vacuum-tube machine with 144K bytes of storage. Since that speed was much faster than the two brilliant logicians who wrote the book, pioneers in AI thought that simulating human intelligence would be easy. For machine translation, Delavenay (1960) claimed “While a great deal remains to be done, it can be stated without hesitation that the essential has already been accomplished.” Good (1965) predicted “It is more probable than not that, within the twentieth century, an ultraintelligent machine will be built and that it will be the last invention that man need make.” The movie 2001, which appeared in 1968, featured the HAL 9000, an intelligent computer that could carry on a conversation in flawless English and even read lips when the humans were trying to communicate in secret. Marvin Minsky, a technical advisor on that movie, claimed it was a “conservative” estimate of AI technology at the end of the 20th century. Yet mathematical tasks, such as proving theorems or playing chess, turned out to be far easier to process by computer than simulating the language skills of a three-year-old child. In chess or mathematics, a computer can exceed human abilities without simulating human thought. But language is so intimately tied to thought that a computer probably cannot understand language without simulating human thinking at some level. That point raises many serious questions: At what level? With what theory of thinking? With what kinds of internal mechanisms? And with what theories and mechanisms for relating the internal processes via the sensory-motor systems to other agents and the world?
Several kinds of theories have been proposed, analyzed, and discussed since antiquity: thoughts are images, thoughts are feelings, thoughts are propositions, and thoughts are multimodal combinations of images, feelings, and propositions. The propositional theory has been the most popular in AI, partly because it’s compatible with a large body of work in logic and partly because it’s the easiest to implement on a digital computer. Figure 1 illustrates the classical paradigm for natural language processing. At the top is a lexicon that maps the vocabulary to speech sounds, word forms, grammar, and word senses. The arrows from left to right link each stage of processing: phonology maps the speech sounds to phonemes; morphology relates the phonemes to meaningful units or morphemes; syntax analyzes a string of morphemes according to grammar rules; and semantics interprets the grammatical patterns to generate propositions stated in some version of logic.

Figure 1. Classical stages in natural language processing

Psycholinguistic evidence since the 1960s has shown that Figure 1 is unrealistic. All the one-way arrows should be double headed, because feedback from later stages has a major influence on processing at earlier stages. Even the arrows from the lexicon should be double headed, because people are constantly learning and coining new words, new word senses, and new variations in syntax and pronunciation. The output labeled logic is also unrealistic, because logicians have not reached a consensus on an ideal logical form and many linguists doubt that logic is an ideal representation for semantics. Furthermore, Figure 1 omits everything about how language is used by people who interact with each other and the world. Figure 2 is a more realistic diagram of the interconnections among the modules.

Figure 2. A more realistic diagram of interconnections

Yet Figure 2 also embodies questionable assumptions.
The box labeled perception, action, and emotion, for example, blurs all the levels of cognition from fish to chimpanzees. Furthermore, the boxes of Figure 2 correspond to traditional academic fields, but there is no evidence that those fields have a one-to-one mapping to modules for processing language in the brain. In particular, the box labeled knowledge should be subdivided in at least three ways: language-independent knowledge stored in image-like form; conceptual knowledge related to language, but independent of any specific language; and knowledge of the phonology, vocabulary, and syntax of specific languages. The box labeled pragmatics deals with the use of language in human activities. Wittgenstein (1953) proposed reorganizing that box in terms of language games: the open-ended variety of ways language is used in social interactions. That subdivision would cause a similar partitioning of the other boxes, especially semantics, knowledge, and the lexicon. It would also affect the variations of syntax and phonology in casual speech, professional jargon, or “baby talk” with an infant. In his first book, Wittgenstein (1921) presented a theory of language and logic based on principles proposed by his mentors, Frege and Russell. Believing he had solved all the problems of philosophy, Wittgenstein retired to an Austrian mountain village, where he taught elementary schoolchildren. Unfortunately, the children did not learn, think, or speak according to those principles. In his second book, Wittgenstein (1953) systematically analyzed the “grave errors” (schwere Irrtümer) in the framework he had adopted. One of the worst was the view that logic is superior to natural languages and should replace them for scientific purposes.
Frege (1879), for example, hoped “to break the domination of the word over the human spirit by laying bare the misconceptions that through the use of language often almost unavoidably arise concerning the relations between concepts.” Russell shared Frege’s low opinion of natural language, and both of them inspired Carnap, the Vienna Circle, and most of analytic philosophy. Many linguists and logicians who work within the paradigm of Figure 1 admit that it’s oversimplified, but they claim that simplification is necessary to enable researchers to address solvable subproblems. Yet Richard Montague and his followers have spent forty years working in that paradigm, and computational linguists have been working on it for half a century. But the goal of designing a system at the level of HAL 9000 seems more remote today than in 1968. Even pioneers in the logic-based approach have begun to doubt its adequacy. Kamp (2001), for example, claimed “that the basic concepts of linguistics — and especially those of semantics — have to be thought through anew” and “that many more distinctions have to be drawn than are dreamt of in current semantic theory.” This article emphasizes the distinctions that were dreamt of and developed by cognitive scientists who corrected or rejected the assumptions by Frege, Russell, and their followers. Section 2 begins with the semeiotic by Charles Sanders Peirce, who had invented the algebraic notation for logic, but who placed it in a broader framework than the 20th-century logicians who used it. Section 3 discusses the ubiquitous pattern matching in every aspect of cognition and its use in logical and analogical reasoning. Section 4 presents Wittgenstein’s language games and the social interactions in which language is learned, used, and understood. Section 5 introduces Minsky’s Society of Mind as a method of supporting the interactions illustrated in Figure 2. 
Section 6 summarizes the lessons learned from work with two earlier language processors. The concluding Section 7 outlines a multilevel approach to language processing that can support more robust and flexible systems.

2. Semeiotic and Biosemiotics

Peirce claimed that the primary characteristic of life is the ability to recognize, interpret, and respond to signs. Signs are even more fundamental than neurons because every neuron is itself a semiotic system: it receives signs and interprets them by generating more signs, which it passes to other neurons or muscle cells. Every cell, even an independent bacterium, is a semiotic system that recognizes chemical, electrical, or tactile signs and interprets them by generating other signs. Those signs can cause the walls of a bacterial cell to contract or expand and move the cell toward nutrients and away from toxins. The brain is a large colony of neural cells, whose signs coordinate a symbiotic relationship within an organism of many kinds of cells. The neural system supports rapid, long-distance communication by electrical signs, but all cells can communicate locally by chemical signs. By secreting chemicals into the blood stream, cells can broadcast signs by a slower, but more pervasive method. At every level from a single cell to a multicellular organism to a society of organisms, signs support and direct all vital processes. Semeiotic is Peirce’s term for the theory of signs. The modern term biosemiotics emphasizes Peirce’s point that sign processing is more general than human language and cognition.

Figure 3. An evolutionary view of the language modules

Deacon (1997), a professional neuroscientist, used Peirce’s theories as a guide for relating neurons to language. Figure 3 illustrates his view that the language modules of the brain are a recent addition and extension of a much older ape-like architecture.
Deacon used Peirce’s categories of icon, index, and symbol to analyze the signs that animals recognize or produce. The calls a hunter utters to control the dogs are indexes, the vocal equivalent of a pointing finger. Vervet monkeys have three types of warning calls: one for eagles, another for snakes, and a third for leopards. Some people suggested that those calls are symbols of different types of animals, but vervets never use them in the absence of the stimulus. More likely, the vervet that sees the stimulus uses the call as an index to tell other vervets to look up, look down, or look around. An early step from index to symbol probably occurred when some hominin proposed a hunt by uttering an index for prey, even before the prey was present. After symbols became common, they would enable planning and organized activities in every aspect of life. The result would be a rapid increase in vocabulary, which would promote the co-evolution of language, brain, vocal tract, and culture. Like Frege, Peirce was a logician who independently developed a complete notation for first-order logic. Unlike Frege, Peirce had a high regard for the power and flexibility of language, and he had worked as an associate editor of the Century Dictionary, for which he wrote, revised, or reviewed over 16,000 definitions. Peirce never rejected language or logic, but he situated both within the broader theory of signs. In his semeiotic, every sign is a triad that relates a perceptible mark (1), to another sign called its interpretant (2), which determines an existing or intended object (3). Following is one of Peirce’s most often quoted definitions: A sign, or representamen, is something which stands to somebody for something in some respect or capacity. It addresses somebody, that is, creates in the mind of that person an equivalent sign, or perhaps a more developed sign. That sign which it creates I call the interpretant of the first sign. The sign stands for something, its object. 
It stands for that object, not in all respects, but in reference to a sort of idea, which I have sometimes called the ground of the representamen. (CP 2.228) A pattern of green and yellow in the lawn, for example, is a mark, and the interpretant is some type, such as Plant, Weed, Flower, SaladGreen, or Dandelion. The guiding idea that determines the interpretant depends on the context and the intentions of the observer. The interpretant determines the word the observer chooses to express the experience. The listener who hears that word uses background knowledge to derive an equivalent interpretant. As Peirce noted, an expert with a richer background can sometimes derive a more developed interpretant than the original observer. Mohanty (1982:58) remarked “Not unlike Frege, Husserl would rather eliminate such fluctuations from scientific discourse, but both are forced to recognize their recalcitrant character for their theories and indispensability for natural languages.” Communication in which both sides have identical interpretants is possible with computer systems. Formal languages are precise, but they are rigid and fragile. The slightest error can and frequently does cause a total breakdown, such as the notorious “blue screen of death.” On the surface, Peirce’s triads seem similar to the meaning triangles by Aristotle, Frege, or Ogden and Richards (1923). The crucial difference is that Peirce analyzed the underlying relationships among the vertices and sides of the triangle. By analyzing the relation between the mark and its object, Peirce (1867) derived the triad of icon, index, and symbol: an icon refers by some similarity to the object; an index refers by a physical effect or connection; and a symbol refers by a law, habit, or convention. Figure 4 shows this relational triad in the middle row.

Figure 4. Peirce’s triple trichotomy

Later, Peirce added the first row or material triad, which signifies by the nature of the mark itself.
The third row or formal triad signifies by a formal rule that relates all three vertices — the mark, interpretant, and object. The basic units of language are characterized by the formal triad: a word serves as a rheme; a sentence, as a dicent sign; and a paragraph or other sequence, as an argument. The labels at the top of Figure 4 indicate how the sign directs attention to the object: by some quality of the mark, by some causal or pointing effect, or by some mediating law, habit, or convention. The following examples illustrate nine types of signs:

1. Qualisign (material quality). A ringing sound as an uninterpreted sensation.
2. Sinsign (material indexicality). A ringing sound that is recognized as coming from a telephone.
3. Legisign (material mediation). The convention that a ringing telephone means someone is trying to call.
4. Icon (relational quality). An image that resembles a telephone when used to indicate a telephone.
5. Index (relational indexicality). A finger pointing toward a telephone.
6. Symbol (relational mediation). A ringing sound on the radio that is used to suggest a telephone call.
7. Rheme (formal quality). A word, such as telephone, which can represent any telephone, real or imagined.
8. Dicent Sign (formal indexicality). A sentence that asserts an actual existence of some object or event: “You have a phone call from your mother.”
9. Argument (formal mediation). A sequence of dicent signs that expresses a lawlike connection: “It may be an emergency. Therefore, you should answer the phone.”

The nine categories in Figure 4 are more finely differentiated than most definitions of signs, and they cover a broader range of phenomena. Anything that exists can be a sign of itself (sinsign), if it is interpreted by an observer.
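The 3×3 arrangement of Figure 4 can be captured as a simple lookup table. This is only an illustrative data structure for the nine types listed above; the row and column names are paraphrased from the figure labels, not Peirce's own terminology:

```python
# The 3x3 trichotomy of Figure 4 as a lookup table. Rows: how the sign is
# constituted (material, relational, formal); columns: how it directs
# attention to its object (quality, indexicality, mediation).
TRICHOTOMY = {
    ("material",   "quality"):      "qualisign",
    ("material",   "indexicality"): "sinsign",
    ("material",   "mediation"):    "legisign",
    ("relational", "quality"):      "icon",
    ("relational", "indexicality"): "index",
    ("relational", "mediation"):    "symbol",
    ("formal",     "quality"):      "rheme",
    ("formal",     "indexicality"): "dicent sign",
    ("formal",     "mediation"):    "argument",
}

print(TRICHOTOMY[("relational", "indexicality")])  # index
```

Reading across a row or down a column of the table makes the systematic character of the nine categories explicit.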
But Peirce (1911:33) did not limit his definition to human minds or even to signs that exist in our universe: A sign, then, is anything whatsoever — whether an Actual or a May-be or a Would-be — which affects a mind, its Interpreter, and draws that interpreter’s attention to some Object (whether Actual, May-be, or Would-be) which has already come within the sphere of his experience. The mind or quasi-mind that interprets a sign need not be human. In various examples, Peirce mentioned dogs, parrots, and bees. Higher animals typically recognize icons and indexes, and some might recognize symbols. A language of some kind is a prerequisite for signs at the formal level of rhemes, dicent signs, and arguments. As these examples show, Peirce’s theory of signs provides a more nuanced basis for analysis than the all-or-nothing question of whether animals have language. Unlike the static meaning triangles of Aristotle or Frege, Peirce’s triangles are dynamic: any vertex can spawn another triad to show three different perspectives on the entity represented by that vertex. During the course of a conversation, the motives of the participants lead the thread of themes and topics from triangle to triangle.

3. Perception, Cognition, and Reasoning

Language affects and is affected by every aspect of cognition. Only one topic is more pervasive than language: signs in general. Every cell of every organism is a semiotic system, which receives signs from the environment, including other cells, and interprets them by generating more signs, both to control its own inner workings and to communicate with other cells of the same organism or different organisms. The brain is a large colony of neural cells, which receives, generates, and transmits signs to other cells of the organism, which is an even larger colony. Every publication in neuroscience describes brains and neurons as systems that receive signs, process signs, and generate signs.
Every attempt to understand those signs relates them to other signs from the environment, to signs generated by the organism, and to theories of those signs in other branches of cognitive science. The meaning of the neural signs can only be determined by situating neuroscience within a more complete theory that encompasses every aspect of cognitive science. By Peirce’s definition of sign, all life processes, especially cognition, involve receiving, interpreting, generating, storing, and transmitting signs and patterns of signs. Experimental evidence is necessary to determine the nature of the signs and the kinds of patterns generated by the interpretation. Perceptual signs are icons derived from sensory stimulation caused by the outside world or caused by internal bodily processes. Recognition consists of interpreting a newly received icon by matching it to previously classified icons called percepts and patterns of percepts called Gestalts. The interpretation of an icon is the pattern formed by the percepts, Gestalts, and other associated signs. The interpreting signs may be image-like percepts or imageless concepts, which are similar to percepts, but without the sensory connections. Analogy is a method of reasoning based on pattern matching, and every method of logic is a constrained use of analogy. As an example, consider the rule of deduction called modus ponens:

    Premise: If P then Q.
    Assertion: P′.
    Conclusion: Q′.

This rule depends on the most basic form of pattern matching: a comparison of P and P′ to determine whether they are identical. If P in the premise is not identical to P′ in the assertion, then a pattern-matching process called unification specializes P by some transformation S that makes S(P) identical to P′. By applying the same specialization S to Q, the conclusion Q′ is derived as S(Q). Each of the following three methods of logic constrains the pattern matching to specialization, generalization, or identity.

1. Deduction.
Specialize a general principle.
    Known: Every bird flies.
    Given: Tweety is a bird.
    Infer: Tweety flies.

2. Induction. Generalize multiple special cases:
    Given: Tweety is a bird. Polly is a bird. Hooty is a bird. Tweety flies. Polly flies. Hooty flies.
    Assume: Every bird flies.

3. Abduction. Given a special case and a known generalization, make a guess that explains the special case.
    Given: Tweety flies.
    Known: Every bird flies.
    Guess: Tweety is a bird.

These three methods of logic depend on the ability to use symbols. In deduction, the general term every bird is replaced by the name of a specific bird Tweety. Induction generalizes a property of multiple individuals — Tweety, Polly, and Hooty — to the category Bird, which subsumes all the instances. Abduction guesses the new proposition Tweety is a bird to explain one or more observations. According to Deacon’s hypothesis that symbols are uniquely human, these three reasoning methods could not be used by nonhuman mammals. According to Peirce (1902), “Besides these three types of reasoning there is a fourth, analogy, which combines the characters of the three, yet cannot be adequately represented as composite.” Its only prerequisite is stimulus generalization — the ability to classify similar patterns of stimuli as signs of similar objects or events. Unlike the more constrained operations of generalization and specialization, similarity may involve a generalization of one part and a specialization of another part of the same pattern. Analogy is more primitive than logic because it does not require language or symbols. In Peirce’s terms, logic requires symbols, but analogy can also be performed on image-like icons. Case-based reasoning (CBR) is an informal method of reasoning, which uses analogy to find and compare cases that may be relevant to a given problem or question.
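Before looking at CBR in detail, the three constrained methods above can be sketched as code over the Tweety examples. This is a minimal illustration only: the names `Rule`, `deduce`, `induce`, and `abduce` are assumptions of this sketch, and real unification is far more general than the equality tests used here.

```python
# Deduction, induction, and abduction as constrained pattern matching.
from dataclasses import dataclass

@dataclass
class Rule:
    premise: str      # a category, e.g. "bird"
    conclusion: str   # a property, e.g. "flies"

def deduce(rule, fact):
    """Specialize a general rule to an instance (modus ponens).
    fact is a pair (individual, category)."""
    individual, category = fact
    return (individual, rule.conclusion) if category == rule.premise else None

def induce(memberships, properties):
    """Generalize: if every known member of the category has the property,
    assume the rule 'every <category> <property>'."""
    members = {i for i, _ in memberships}
    having = {i for i, _ in properties}
    if members and members <= having:
        return Rule(memberships[0][1], properties[0][1])
    return None

def abduce(rule, observation):
    """Guess a category membership that would explain an observed property."""
    individual, prop = observation
    return (individual, rule.premise) if prop == rule.conclusion else None

flies = Rule("bird", "flies")
print(deduce(flies, ("Tweety", "bird")))   # ('Tweety', 'flies')
print(induce([("Tweety", "bird"), ("Polly", "bird")],
             [("Tweety", "flies"), ("Polly", "flies")]))
print(abduce(flies, ("Tweety", "flies")))  # ('Tweety', 'bird')
```

Note that `abduce` returns a guess, not a certainty: flying is consistent with Tweety being a bird, but does not prove it, which is exactly the fallibility of abduction described above.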
Whether the medium consists of discrete words or continuous images, CBR methods start with a question or goal Q about some current problem or situation P. By analogy, cases that resemble P are recalled from long-term memory and ranked according to their similarity to P. The case with the greatest similarity (i.e., smallest semantic distance) is the most likely to answer the question Q. When a similar case is found, the part of the case that matches Q is the predicted answer. If two or more cases are similar to P, they might not predict the same answer. If they do, that answer can be accepted with a high degree of confidence. If not, multiple cases can be combined by some transformation: a disjunction (Q1 or Q2), a generalization of Q1 and Q2, or a blend of features from both. A semantic distance measure could be used to choose the most appropriate transformation by comparing the results with typical examples in the knowledge base. Both logic and CBR have a large overlap on which they’re compatible: they would generate consistent responses to the same questions. For highly regular data, induction can generalize many cases to rules of the form If P, then Q. For such data, CBR would derive the same conclusions as a method of deduction called backward chaining: a goal Q′ is unified to the conclusion Q of some if-then rule by means of a specialization S; the application of S to P produces the pattern P′, which is a generalization of one or more cases. Formal deduction is best suited to thoroughly analyzed areas of science, for which induction can reduce a large number of cases to a small number of rules. CBR is most valuable for subjects with highly varied or frequently changing cases, for which any axioms would have a long list of exceptions. In legal reasoning, for example, the list of laws and the list of cases are enormous, and nearly every generalization has as many exceptions as applications. 
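The retrieval step described above can be sketched as a toy case-based reasoner. The flat feature dictionaries and the mismatch count are illustrative stand-ins for real case structures and a real semantic-distance measure:

```python
# Toy CBR retrieval: rank stored cases by similarity to the current problem
# and read the predicted answer off the best match.

def distance(features, problem):
    """Number of mismatched features: smaller means more similar."""
    keys = set(features) | set(problem)
    return sum(features.get(k) != problem.get(k) for k in keys)

def retrieve(memory, problem):
    """Return stored cases ordered from most to least similar."""
    return sorted(memory, key=lambda case: distance(case["features"], problem))

memory = [
    {"features": {"wings": True, "feathers": True}, "outcome": "flies"},
    {"features": {"wings": False, "fins": True},    "outcome": "swims"},
]
best = retrieve(memory, {"wings": True, "feathers": True})[0]
print(best["outcome"])  # flies
```

A fuller system would then handle the case where several retrieved cases disagree, by taking a disjunction, a generalization, or a blend of their answers, as the text describes.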
For both formal and informal reasoning, a high-speed method of indexing and finding relevant data is essential, but discrete list-processing methods have been too slow. The world is continuous, all physical motions are continuous, feelings and sensations vary continuously, but every natural language has a discrete, finite set of meaningful units or morphemes. No discrete set of symbols can faithfully represent a continuous world, but a cognitive system must map discrete words to and from continuous sensations. Wildgen (1982, 1994) maintained that continuous fields are the primary basis for perception and cognition, and he adopted René Thom’s catastrophe-theoretic semantics for identifying the patterns that map to the discrete words and phrases. That approach is still controversial, but the principle of mapping discrete structures such as conceptual graphs (CGs) to continuous fields has proved to be valuable for developing efficient methods for indexing CGs and computing the semantic distance between them (Sowa & Majumdar 2003). Those methods were used for finding analogies by the VivoMind Analogy Engine (VAE), and more precise and flexible mappings have been implemented in a new system called Cognitive Memory™. This system is based on active agents, as discussed in Section 5, and it encodes arbitrarily large conceptual graphs in Cognitive Signatures™, which are mathematical structures embedded in a continuous field. Psychologically, those signatures represent chunks of knowledge that can be related to other chunks by high-speed numeric computations.
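The general idea of encoding a discrete structure as a point in a continuous space, so that similarity search reduces to fast numeric comparison, can be sketched as follows. The hashing scheme here is a generic illustration of the principle, not the proprietary Cognitive Signature encoding:

```python
# Map a graph (a set of labeled edges) to a fixed-dimension vector, then
# compare graphs by the Euclidean distance between their vectors.
import hashlib
import math

def signature(edges, dim=16):
    """Hash each labeled edge into one of dim buckets, count occurrences,
    and normalize the counts to a unit-length vector."""
    vec = [0.0] * dim
    for edge in edges:
        h = int(hashlib.md5(repr(edge).encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def semantic_distance(sig_a, sig_b):
    """Euclidean distance between signatures; 0.0 for identical encodings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))

g1 = [("cat", "on", "mat"), ("mat", "near", "door")]
g2 = [("cat", "on", "mat"), ("mat", "near", "window")]
print(semantic_distance(signature(g1), signature(g1)))  # 0.0
print(semantic_distance(signature(g1), signature(g2)))
```

Once every graph is reduced to a fixed-size vector, indexing and nearest-neighbor retrieval become purely numeric operations, which is what makes this family of methods fast enough for analogy finding over large knowledge bases.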